Abstract

The successive projection algorithm (SPA) is known to work well for separable nonnegative matrix factorization (NMF) problems arising in applications such as topic extraction from documents and endmember detection in hyperspectral images. One reason is that the algorithm is robust to noise. Gillis and Vavasis showed in [SIAM J. Optim., 25(1), pp. 677-698, 2015] that a preconditioner can further enhance its noise robustness. Their proof relied on the condition that the dimension d and the factorization rank r in the separable NMF problem coincide. However, it may be unrealistic to expect this condition to hold in separable NMF problems arising in actual applications; in such problems, d is usually greater than r. This paper shows, without the condition d = r, that the preconditioned SPA is robust to noise.

Keywords: nonnegative matrix factorization, separability, successive projection algorithm, 
robustness to noise, preconditioning 
1 Introduction

A $d$-by-$m$ nonnegative matrix $A$ is said to be separable if it has a decomposition of the form

$$A = FW \quad \text{for } F \in \mathbb{R}^{d \times r}_+ \text{ and } W = (I, K)\Pi \in \mathbb{R}^{r \times m}_+, \tag{1}$$

where $I$ is the $r$-by-$r$ identity matrix, $K$ is an $r$-by-$(m-r)$ nonnegative matrix, and $\Pi$ is an $m$-by-$m$ permutation matrix. Here, we call $F$ the basis matrix of $A$ and $r$ the factorization rank.
The separable nonnegative matrix factorization problem is stated as follows. 

(Separable NMF Problem) Let $A$ be of the form given in (1). Find an index set $\mathcal{I}$ with $r$ elements such that $A(\mathcal{I})$ coincides with the basis matrix $F$.
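To make the form (1) concrete, here is a minimal NumPy sketch that builds a separable matrix from a basis matrix F, a nonnegative K and a column permutation, and returns an index set I with A(I) = F. The helper name `make_separable` and the encoding of Π by an index array `perm` (column `perm[j]` of A is column j of (I, K)) are illustrative assumptions, not notation from the paper; indices are 0-based.

```python
import numpy as np

def make_separable(F, K, perm):
    """Build A = F (I, K) Pi and an index set I with A(I) = F.

    `perm` is a permutation of range(m); column perm[j] of A is column j of (I, K).
    This encoding of Pi is an illustrative choice.
    """
    r = F.shape[1]
    W0 = np.hstack([np.eye(r), K])          # the r-by-m matrix (I, K)
    W = np.empty_like(W0)
    W[:, perm] = W0                         # W = (I, K) Pi
    A = F @ W
    index_set = [int(perm[j]) for j in range(r)]   # A[:, index_set] equals F
    return A, index_set

rng = np.random.default_rng(0)
F = rng.random((5, 3))                      # nonnegative 5-by-3 basis matrix
K = rng.random((3, 4))                      # nonnegative 3-by-4 matrix
A, I = make_separable(F, K, rng.permutation(7))
print(np.allclose(A[:, I], F))              # True
```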

The notation $A(\mathcal{I})$ denotes the submatrix of $A$ whose column indices are in $\mathcal{I}$; in other words, $A(\mathcal{I}) = (a_i : i \in \mathcal{I})$ for the $i$th column vector $a_i$ of $A$. We use the abbreviation NMF to refer to nonnegative matrix factorization. The problem above can be thought of as a special case of the NMF problem. The NMF problem is intractable; in fact, it was shown to be NP-hard in [13]. The authors of [3] proposed imposing an assumption called separability, which turns the NMF problem into a tractable one, referred to as the separable NMF problem. Although the assumption may restrict the range of applications, separable NMF problems are nonetheless known to be useful for topic extraction from documents [4, 2, 11] and endmember detection in hyperspectral images [7, 8].

Several algorithms have been developed for solving the separable NMF problem. One of our concerns is how robust these algorithms are to noise, since it is reasonable to suppose that the separable matrix contains noise in separable NMF problems arising from the applications mentioned above. Consider an algorithm for solving a separable NMF problem, and suppose that the separable matrix contains noise. If the algorithm can identify a matrix close to the basis matrix, we say that it is robust to noise.

The successive projection algorithm (SPA) was originally proposed in [1] in the context of 
chemometrics. Currently, the algorithm and its variants are used for topic modeling, document 
clustering and hyperspectral image unmixing. Gillis and Vavasis showed in [7] that SPA is robust 
to noise and presented empirical results suggesting that the algorithm is a promising approach to 
hyperspectral image unmixing. The theoretical results implied that further improvement in noise 
robustness can be expected if we can make the condition number of the basis matrix smaller. Hence, 
they proposed in [8] to use a preconditioning matrix for reducing the condition number of the basis 
matrix. They showed that the noise robustness of SPA is improved by using a preconditioner. The 
proof rested on the condition that the dimension d and factorization rank r in a separable matrix 
coincide. However, it may be unrealistic to expect this condition to hold in separable NMF problems derived from actual applications, where $d$ is usually greater than $r$. For instance, consider extracting topics from a collection of newspaper articles. This task can be modeled as a separable NMF problem in which the dimension $d$ and the factorization rank $r$ of the separable matrix correspond to the number of articles and the number of topics, respectively. It would be rare for $d$ to be close to $r$; usually, $d$ is greater than $r$.

The aim of this paper is to show, without the condition $d = r$, that the preconditioned SPA is robust to noise. Our result is stated in Theorem 3. It can serve as a guide for assessing how robust the algorithm is to noise when handling separable NMF problems derived from actual applications.

The rest of this paper is organized as follows. In Section 2, we review SPA and its preconditioned variant and describe the noise robustness results obtained in a series of studies by Gillis and Vavasis. We then state our result on the preconditioned SPA and compare it with theirs. Our analysis is given in Section 3.


1.1 Notation and Terminology 

A real matrix is said to be nonnegative if all of its elements are nonnegative. We use the symbol $\mathbb{R}^{d \times m}$ to represent the set of $d$-by-$m$ real matrices, and $\mathbb{R}^{d \times m}_+$ the set of $d$-by-$m$ nonnegative matrices. The identity matrix is denoted by $I$ and the permutation matrix by $\Pi$. The vector of all ones is denoted by $e$ and the $i$th unit vector by $e_i$. We use an upper-case letter such as $A$ to denote a matrix, and the corresponding lower-case letter with a subscript, $a_i$, to denote its $i$th column. We denote the transpose by $A^\top$, the rank by $\mathrm{rank}(A)$ and a matrix norm by $\|A\|$. In particular, the matrix 2-norm and the Frobenius norm are written as $\|A\|_2$ and $\|A\|_F$. For matrices $A$ and $B$ with the same number of columns, we use the symbol $(A; B)$ to represent the matrix

$$\begin{pmatrix} A \\ B \end{pmatrix},$$

and for numbers $a_1, \dots, a_m$ we use the symbol $\mathrm{diag}(a_1, \dots, a_m)$ to represent the diagonal matrix

$$\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_m \end{pmatrix}.$$

For two real numbers $a$ and $b$, the symbol $\min(a, b)$ denotes the smaller of the two.


1.2 Tools from Linear Algebra 

Every real or complex matrix has a singular value decomposition (SVD). We will use the SVD of a real matrix in our subsequent discussion. Let $A \in \mathbb{R}^{d \times m}$. The SVD of $A$ can be written as

$$A = U \Sigma V^\top. \tag{2}$$

Here, $U$ and $V$ are $d$-by-$d$ and $m$-by-$m$ orthogonal matrices; the column vectors of $U$ and $V$ are called the left and right singular vectors of $A$, respectively. $\Sigma$ is a $d$-by-$m$ diagonal matrix. If $d < m$, it has the form $(\mathrm{diag}(\sigma_1, \dots, \sigma_d), 0)$ for a $d$-by-$(m-d)$ zero matrix $0$; otherwise, it has the form $(\mathrm{diag}(\sigma_1, \dots, \sigma_m); 0)$ for a $(d-m)$-by-$m$ zero matrix $0$. Let $t = \min(d, m)$. The diagonal elements $\sigma_1, \dots, \sigma_t$ are all nonnegative and are called the singular values of $A$. By reordering the columns of $U$ and $V$ (together with the diagonal elements of $\Sigma$), we can arrange the singular values in descending order. Therefore, throughout this paper, we always assume that $\sigma_1 \ge \cdots \ge \sigma_t$ in (2). We use the symbols $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ to denote the smallest and largest singular values; in other words, $\sigma_{\min}(A) = \sigma_t$ and $\sigma_{\max}(A) = \sigma_1$. We define the condition number of $A$ as $\sigma_{\max}(A)/\sigma_{\min}(A)$ and denote it by $\kappa(A)$.
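As a small illustration of these definitions, the following NumPy sketch computes κ(A) from the singular values. `np.linalg.svd` with `compute_uv=False` returns the t = min(d, m) singular values in descending order, which matches the ordering convention above; the function name is ours.

```python
import numpy as np

def condition_number(A):
    """kappa(A) = sigma_max(A) / sigma_min(A), where sigma_min(A) = sigma_t, t = min(d, m)."""
    s = np.linalg.svd(A, compute_uv=False)   # sigma_1 >= ... >= sigma_t
    return s[0] / s[-1]

A = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])              # 2-by-3, singular values 3 and 1
print(condition_number(A))                   # 3.0
```

For square matrices this agrees with `np.linalg.cond(A)` in its default 2-norm mode.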

Let $A$ be a $d$-by-$d$ symmetric positive definite matrix. Since $A$ is symmetric, it has an eigenvalue decomposition $A = U \Lambda U^\top$, where $U$ is a $d$-by-$d$ orthogonal matrix and $\Lambda$ is a $d$-by-$d$ diagonal matrix; by positive definiteness, its diagonal elements $\lambda_1, \dots, \lambda_d$ are positive. We define the square root of $A$ as $U \Lambda^{1/2} U^\top$, where $\Lambda^{1/2} = \mathrm{diag}(\lambda_1^{1/2}, \dots, \lambda_d^{1/2})$, and denote it by $A^{1/2}$. We use the symbol $A \succ 0$ to mean that $A$ is positive definite.
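The definition of $A^{1/2}$ translates directly into code. The sketch below uses `np.linalg.eigh`, which returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix; the function name `spd_sqrt` is ours.

```python
import numpy as np

def spd_sqrt(A):
    """A^{1/2} = U diag(lambda_1^{1/2}, ..., lambda_d^{1/2}) U^T for symmetric A > 0."""
    lam, U = np.linalg.eigh(A)                  # A = U diag(lam) U^T
    if np.any(lam <= 0):
        raise ValueError("A must be positive definite")
    return (U * np.sqrt(lam)) @ U.T             # scale the columns of U, then multiply by U^T

B = np.array([[2.0, 1.0], [0.0, 1.0]])
A = B @ B.T                                     # symmetric positive definite
R = spd_sqrt(A)
print(np.allclose(R @ R, A), np.allclose(R, R.T))   # True True
```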


2 Noise Robustness of the Preconditioned SPA for Separable 
NMF Problems 

This section consists of three subsections. We start by examining separable NMF problems from a geometric point of view and summarize the steps of SPA in Algorithm 1. The geometric interpretation of the problems helps us understand the idea behind SPA. The analysis of the noise robustness of SPA in [7] is described in Theorem 1. Next, assuming the condition $d = r$, we intuitively explain why the noise robustness of SPA can be enhanced by using a preconditioning matrix. Then we describe the result for the preconditioned SPA in [8] in Theorem 2. We summarize the preconditioned SPA in Algorithm 2, which works without the condition $d = r$. Finally, we describe our result on the robustness of Algorithm 2 to noise and compare it with Theorem 2 of [8].
2.1 Review of SPA


Separable NMF problems have a geometric interpretation. Let $A$ be of the form given in (1). Without loss of generality, we can assume that every column vector $k_i$ of $K$ satisfies $e^\top k_i = 1$, since

$$A = FW \iff AD_1 = FD_2 \cdot D_2^{-1} W D_1$$

for nonsingular diagonal matrices $D_1$ and $D_2$. In addition, assume that $\mathrm{rank}(F) = r$. Under these assumptions, the convex hull of the column vectors of $A$ is an $(r-1)$-dimensional simplex in $\mathbb{R}^d$, and its vertices correspond to the column vectors of $F$. Accordingly, we can restate the separable NMF problem as follows: find all vertices of the convex hull of the column vectors of $A$. Further explanation of the problem can be found in [7, 8, 11].
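The scaling argument can be checked numerically. In the sketch below (an illustrative instance with Π = I, not taken from the paper), $D_1$ scales each column of $A$ and $D_2$ each column of $F$ to unit sum. The rescaled matrix $D_2^{-1}(I, K)D_1$ keeps its identity block, and since $e^\top F D_2 = e^\top$, every column of the new $K$ sums to one.

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.random((4, 3))                          # nonnegative basis matrix
K = rng.random((3, 5))                          # nonnegative K with arbitrary column sums
W = np.hstack([np.eye(3), K])                   # W = (I, K), Pi = identity
A = F @ W

D1 = np.diag(1.0 / A.sum(axis=0))               # columns of A D1 sum to one
D2 = np.diag(1.0 / F.sum(axis=0))               # columns of F D2 sum to one
W_scaled = np.diag(F.sum(axis=0)) @ W @ D1      # D2^{-1} W D1
K_scaled = W_scaled[:, 3:]

print(np.allclose(A @ D1, (F @ D2) @ W_scaled))  # True: A D1 = F D2 D2^{-1} W D1
print(np.allclose(W_scaled[:, :3], np.eye(3)))   # True: the identity block is preserved
print(K_scaled.sum(axis=0))                      # every entry is 1 up to rounding
```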

SPA is designed on the basis of this geometric interpretation. The first step finds, among the column vectors $a_1, \dots, a_m$ of $A$, the one $a_{i^*}$ that maximizes the convex function $f(x) = \|x\|_2^2$, and then projects $a_1, \dots, a_m$ onto the orthogonal complement of $a_{i^*}$. This procedure is repeated until $r$ column vectors are found. As pointed out in [7], SPA is closely related to QR factorization with column pivoting [5]. Algorithm 1 describes each step of SPA. To see why SPA can find $F$ from $A$, recall the following property: given a set of points in a polytope that includes all of its vertices, the maximum of a strongly convex function over the set is attained at one of the vertices.


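Algorithm 1 SPA

Input: A $d$-by-$m$ real matrix $A$ and a positive integer $r$.

Output: An index set $\mathcal{I}$.

1: Initialize a matrix $S$ as $S \leftarrow A$, and an index set $\mathcal{I}$ as $\mathcal{I} \leftarrow \emptyset$.

2: Find an index $i^*$ such that $i^* = \arg\max_{i=1,\dots,m} \|s_i\|_2$ for the column vectors $s_i$ of $S$.

3: Set $t \leftarrow s_{i^*}$. Update $S$ as

$$S \leftarrow \left(I - \frac{t t^\top}{\|t\|_2^2}\right) S,$$

and $\mathcal{I}$ as

$$\mathcal{I} \leftarrow \mathcal{I} \cup \{i^*\}.$$

4: Go back to step 2 if $|\mathcal{I}| < r$; otherwise, output $\mathcal{I}$ and terminate.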
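A minimal NumPy sketch of Algorithm 1 follows. It is a direct transcription of steps 1 to 4 with 0-based indices, not an optimized implementation, and it assumes that the projected columns never all vanish before r indices are chosen, which holds under Assumption 1(a) below in the noiseless case.

```python
import numpy as np

def spa(A, r):
    """Algorithm 1 (SPA): return r column indices of A (0-based), in selection order."""
    S = np.array(A, dtype=float)                 # step 1: S <- A
    index_set = []
    for _ in range(r):
        i_star = int(np.argmax(np.sum(S * S, axis=0)))   # step 2: largest ||s_i||_2
        t = S[:, i_star].copy()                  # step 3: t <- s_{i*}
        S -= np.outer(t, t @ S) / (t @ t)        # S <- (I - t t^T / ||t||_2^2) S
        index_set.append(i_star)                 # I <- I u {i*}
    return index_set                             # step 4: stop once |I| = r

rng = np.random.default_rng(0)
F = rng.random((6, 4))
K = rng.random((4, 10))
K *= 0.9 / K.sum(axis=0)                         # every column of K sums to 0.9
A = np.hstack([F @ K, F])                        # basis columns are 10, 11, 12, 13
print(sorted(spa(A, 4)))                         # [10, 11, 12, 13]
```

The rank-one update avoids forming the $d$-by-$d$ projector explicitly.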


Now let us describe the analysis of Algorithm 1 given by Gillis and Vavasis in [7]. We impose the following assumption on a $d$-by-$m$ real matrix $A$.

Assumption 1. $A$ can be decomposed into $A = FW$ for $F \in \mathbb{R}^{d \times r}$ and $W = (I, K)\Pi \in \mathbb{R}^{r \times m}_+$, where $I$, $K$ and $\Pi$ are the same as those of (1). $F$ and the column vectors $k_i$ of $K$ satisfy the following conditions.

(a) $\mathrm{rank}(F) = r$.

(b) $\|k_i\|_1 \le 1$ for all $i = 1, \dots, m-r$.

Assumption 1 corresponds to the one made in [7, 8]. Note that the decomposition of $A$ in the assumption is not exactly the same as that of (1): $F$ in (1) is a nonnegative matrix, but $F$ in the assumption is not necessarily nonnegative. As mentioned at the beginning of this section, from the relation $A = FW \iff AD_1 = FD_2 \cdot D_2^{-1} W D_1$ for nonsingular diagonal matrices $D_1$ and $D_2$, Assumption 1(b) can be assumed without loss of generality. Under Assumption 1, Gillis and Vavasis showed in [7] that Algorithm 1 on the input $(A, r)$ returns $\mathcal{I}$ such that $A(\mathcal{I}) = F$. Furthermore, they showed that it is robust to noise. Suppose that a separable matrix $A$ of (1) is perturbed by a noise matrix $N \in \mathbb{R}^{d \times m}$ such that

$$\tilde{A} = A + N. \tag{3}$$

We call $\tilde{A}$ a near-separable matrix and $N$ a noise matrix. Their analysis tells us that running Algorithm 1 on the input $(\tilde{A}, r)$ returns $\mathcal{I}$ such that $\tilde{A}(\mathcal{I})$ is close to $F$ if $N$ is small. The formal statement is as follows.


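Theorem 1 (Theorem 3 of [7]). Let $\tilde{A} = A + N$ for $A \in \mathbb{R}^{d \times m}$ and $N \in \mathbb{R}^{d \times m}$. Suppose that $r \ge 2$ and $A$ satisfies Assumption 1. If the column vectors $n_i$ of $N$ satisfy $\|n_i\|_2 \le \epsilon$ for all $i = 1, \dots, m$ with

$$\epsilon < \min\left(\frac{1}{2\sqrt{r-1}}, \frac{1}{4}\right) \frac{\sigma_{\min}(F)}{1 + 80\kappa(F)^2},$$

then Algorithm 1 with the input $(\tilde{A}, r)$ returns an output $\mathcal{I}$ such that there is an order of the elements in $\mathcal{I}$ satisfying

$$\|\tilde{a}_{\mathcal{I}(j)} - f_j\|_2 \le (1 + 80\kappa(F)^2)\epsilon$$

for all $j = 1, \dots, r$.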


The notation $\mathcal{I}(j)$ represents the $j$th element of $\mathcal{I}$ for a set $\mathcal{I}$ whose elements are arranged in some order. Throughout this paper, we use the notation $\mathcal{I}(j)$ to refer to the $j$th element of the ordered set $\mathcal{I}$. In the theorem, $\tilde{a}_{\mathcal{I}(j)}$ is the $\mathcal{I}(j)$th column vector of $\tilde{A}$, and $f_j$ is the $j$th column vector of $F$. The statement of the above theorem does not exactly match that of Theorem 3 of [7], so we add a remark.

Remark 1. Theorem 3 of [7] is stated using $L$, $\mu$ and $K(F)$, which do not appear in the above theorem. Let $f$ be a strongly convex function whose gradient is Lipschitz continuous. Then $L$ corresponds to the Lipschitz constant of the gradient of $f$, and $\mu$ is the strong convexity parameter of $f$. We have $L = \mu$ since we consider the case $f(x) = \|x\|_2^2$. $K(F)$ is defined as $K(F) = \max_{j=1,\dots,r} \|f_j\|_2$. From this definition, we have $K(F) \le \sigma_{\max}(F)$. Therefore, the above theorem follows from Theorem 3 of [7].
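To get a feel for the constants in Theorem 1, the sketch below evaluates the admissible noise level and the error factor $1 + 80\kappa(F)^2$ for a given basis matrix $F$. It only transcribes the formulas as stated above; the function name is ours.

```python
import numpy as np

def theorem1_bounds(F):
    """Return (eps_bound, factor) from Theorem 1 for a d-by-r basis matrix F (r >= 2).

    The theorem assumes ||n_i||_2 <= eps with eps < eps_bound and then guarantees
    ||a~_{I(j)} - f_j||_2 <= factor * eps for a suitable order of I.
    """
    s = np.linalg.svd(F, compute_uv=False)
    r = F.shape[1]
    factor = 1.0 + 80.0 * (s[0] / s[-1]) ** 2          # 1 + 80 kappa(F)^2
    eps_bound = min(1.0 / (2.0 * np.sqrt(r - 1)), 0.25) * s[-1] / factor
    return float(eps_bound), float(factor)

eps_bound, factor = theorem1_bounds(np.eye(3))      # kappa = 1: factor 81, eps_bound = 0.25 / 81
print(eps_bound, factor)
print(theorem1_bounds(np.diag([10.0, 1.0]))[1])    # kappa = 10: factor 8001
```

The quadratic growth of the factor in $\kappa(F)$ is what motivates the preconditioning discussed next.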


2.2 Preconditioned SPA 

Consider a near-separable matrix $\tilde{A}$ of (3). Theorem 1 suggests that, if the condition number of the basis matrix $F$ is close to one, we may expect the allowed range of $\|n_i\|_2$ to widen and the difference between $F$ and $\tilde{A}(\mathcal{I})$ to shrink. Assume that $A$ in $\tilde{A}$ satisfies Assumption 1. Let $Q$ be a $d$-by-$d$ nonsingular matrix. Then, multiplying $\tilde{A}$ by $Q$ yields $Q\tilde{A} = QFW + QN$. The assumption remains valid for $QF$ due to the nonsingularity of $Q$. Accordingly, if we can construct $Q$ so as to decrease the condition number of $F$, the noise robustness of SPA may be improved by running SPA on the input $(Q\tilde{A}, r)$ instead of $(\tilde{A}, r)$. In [8], Gillis and Vavasis proposed a procedure for constructing such a preconditioning matrix $Q$. Note, however, that even if $Q$ decreases the condition number of $F$, the noise could be amplified by up to a factor of $\|Q\|$, since $\|QN\| \le \|Q\| \|N\|$.
2.2.1 Case of d = r


Now let us explain the procedure for constructing a preconditioning matrix in [8]. For simplicity, we first consider the noiseless case, that is, $\tilde{A} = A$. We assume that $A$ satisfies Assumption 1 and, in addition, that the dimension $d$ and the factorization rank $r$ coincide. Under these assumptions, $A$ is an $r$-by-$m$ separable matrix with an $r$-by-$r$ basis matrix $F$. We set $\mathcal{S} = \{a_1, \dots, a_m\}$ for the column vectors $a_1, \dots, a_m$ of $A$, and consider the optimization problem

$$\begin{aligned} \mathrm{P}(\mathcal{S}):\quad & \text{minimize} && -\log\det(L), \\ & \text{subject to} && a^\top L a \le 1 \ \text{ for all } a \in \mathcal{S}, \\ &&& L \succ 0. \end{aligned}$$

$L$ is the decision variable. $\mathrm{P}(\mathcal{S})$ corresponds to the formulation of computing the minimum volume enclosing ellipsoid (MVEE) centered at the origin for $\mathcal{S}$. It has been shown in [11, 8] that the optimal solution $L^*$ is given by $(FF^\top)^{-1}$. Therefore, $(L^*)^{1/2}$ can be used as a preconditioning matrix to improve the condition number of $F$, since

$$\kappa\big((L^*)^{1/2}F\big)^2 = \kappa(F^\top L^* F) = \kappa(I) = 1.$$

Next, we consider the noisy case. Let $L^*$ be the optimal solution of $\mathrm{P}(\mathcal{S})$ for $\mathcal{S} = \{\tilde{a}_1, \dots, \tilde{a}_m\}$, where $\tilde{a}_1, \dots, \tilde{a}_m$ are the column vectors of $\tilde{A}$. In this case, $L^*$ does not exactly match $(FF^\top)^{-1}$, but the difference between the two matrices is expected to be small if the amount of noise is small. Therefore, $(L^*)^{1/2}$ could serve as a preconditioning matrix for reducing the condition number of $F$.
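The effect of the preconditioner in the noiseless case $d = r$ can be checked numerically. The sketch below takes $L^* = (FF^\top)^{-1}$ as stated above, rather than solving $\mathrm{P}(\mathcal{S})$, and verifies that $(L^*)^{1/2}F$ has orthonormal columns and therefore condition number one. The diagonally dominant test matrix is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
r = 4
F = rng.random((r, r)) + r * np.eye(r)        # square, diagonally dominant, hence nonsingular
L_star = np.linalg.inv(F @ F.T)               # optimal solution of P(S) when d = r and N = 0
lam, U = np.linalg.eigh(L_star)
Q = (U * np.sqrt(lam)) @ U.T                  # preconditioner (L*)^{1/2}

QF = Q @ F
print(np.allclose(QF.T @ QF, np.eye(r)))      # True: F^T L* F = I
s = np.linalg.svd(QF, compute_uv=False)
print(s[0] / s[-1])                           # 1.0 up to rounding, i.e. kappa((L*)^{1/2} F) = 1
```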

We add a further explanation of $\mathrm{P}(\mathcal{S})$. The origin-centered MVEE for the points in $\{\pm a : a \in \mathcal{S}\}$ is given as $\{x \in \mathbb{R}^r : x^\top L^* x \le 1\}$, where $L^*$ is the optimal solution. The volume of the MVEE is $c(r)/\sqrt{\det(L^*)}$, where $c(r)$ is the volume of the unit ball in $\mathbb{R}^r$, a real number depending on the dimension $r$. Since $\mathrm{rank}(A) = r$ due to Assumption 1(a), the convex hull of the points in $\{\pm a : a \in \mathcal{S}\}$ is full-dimensional in $\mathbb{R}^r$, and thus the MVEE has a positive volume. $\mathrm{P}(\mathcal{S})$ is a convex optimization problem. Efficient algorithms, such as interior-point algorithms and Frank-Wolfe algorithms, have been developed for solving it; see, for instance, [10, 12] for details.
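For illustration, here is a minimal sketch of one Frank-Wolfe-type method for $\mathrm{P}(\mathcal{S})$: Khachiyan's algorithm applied to its dual, the D-optimal design problem of maximizing $\log\det M(u)$ with $M(u) = \sum_i u_i p_i p_i^\top$ over weights $u$ in the unit simplex. This is our choice of method, not necessarily the one used in [8], [10] or [12], and `tol` and `max_iter` are assumed parameters. At an optimal $u$, $\max_i p_i^\top M(u)^{-1} p_i = r$ and $L^* = (r M(u))^{-1}$; the sketch stops once this maximum is within a factor $1 + \mathrm{tol}$ of $r$ and rescales $M(u)^{-1}$ so that every constraint holds.

```python
import numpy as np

def mvee_origin(Pm, tol=1e-3, max_iter=100_000):
    """Approximate solution L of P(S), where S is the set of columns of the r-by-m
    matrix Pm (assumed to have rank r): minimize -log det(L) s.t. p^T L p <= 1.

    Khachiyan's (Frank-Wolfe) method on the dual problem
    maximize log det(M(u)), M(u) = sum_i u_i p_i p_i^T, over the unit simplex.
    """
    r, m = Pm.shape
    u = np.full(m, 1.0 / m)                            # uniform starting weights
    for _ in range(max_iter):
        M = (Pm * u) @ Pm.T                            # M(u)
        w = np.sum(Pm * np.linalg.solve(M, Pm), axis=0)   # w_i = p_i^T M(u)^{-1} p_i
        j = int(np.argmax(w))
        if w[j] <= r * (1.0 + tol):                    # near-optimal: stop
            break
        step = (w[j] - r) / (r * (w[j] - 1.0))         # exact line-search step
        u *= 1.0 - step                                # move weight towards point j
        u[j] += step
    M = (Pm * u) @ Pm.T
    w = np.sum(Pm * np.linalg.solve(M, Pm), axis=0)
    return np.linalg.inv(M) / w.max()                  # feasible: p_i^T L p_i = w_i / max(w) <= 1

# Usage on a noiseless separable instance with d = r, where L* = (F F^T)^{-1}.
rng = np.random.default_rng(3)
r = 3
F = rng.random((r, r)) + r * np.eye(r)
K = rng.random((r, 6))
K *= 0.8 / K.sum(axis=0)
Pm = np.hstack([F, F @ K])
L = mvee_origin(Pm)
ratio = np.linalg.det(L) * np.linalg.det(F @ F.T)      # det(L) / det(L*), close to 1
```

When the stopping test is met, $\det(L) \ge (1 + \mathrm{tol})^{-r}\det(L^*)$, because any feasible $L'$ satisfies $\mathrm{tr}(L' M(u)) \le 1$ and hence $\det(L') \le \det(M(u))^{-1} r^{-r}$.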

Gillis and Vavasis showed in [8] that this preconditioner improves the noise robustness of SPA under Assumption 1 and $d = r$. Their result is as follows.

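Theorem 2 (Theorem 2.9 of [8]). Let $\tilde{A} = A + N$ for $A \in \mathbb{R}^{r \times m}$ and $N \in \mathbb{R}^{r \times m}$. Suppose that $A$ satisfies Assumption 1 and the condition $d = r$. Let $L^*$ be the optimal solution of $\mathrm{P}(\mathcal{S})$, where $\mathcal{S} = \{\tilde{a}_1, \dots, \tilde{a}_m\}$ for the column vectors $\tilde{a}_1, \dots, \tilde{a}_m$ of $\tilde{A}$. If the column vectors $n_i$ of $N$ satisfy $\|n_i\|_2 \le \epsilon$ for all $i = 1, \dots, m$ with

$$\epsilon < O\left(\frac{\sigma_{\min}(F)}{r\sqrt{r}}\right),$$

then Algorithm 1 with the input $((L^*)^{1/2}\tilde{A}, r)$ returns an output $\mathcal{I}$ such that the size of the basis error of $\mathcal{I}$ is at most $O(\kappa(F)\epsilon)$.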

The “size of the basis error” in the above statement needs a precise definition. Let $\mathcal{I}$ be a subset of $\{1, \dots, m\}$ with $r$ elements, and suppose that its elements are arranged in some order. Given a near-separable matrix $\tilde{A}$ of (3), the size of the basis error of $\mathcal{I}$ is

$$\max_{j=1,\dots,r} \|\tilde{a}_{\mathcal{I}(j)} - f_j\|_2.$$
Gillis and Ma [6] developed other types of preconditioning matrices for SPA and analyzed their noise robustness.
2.2.2 Case of d ≠ r


The discussion in the previous section was made under Assumption 1 and the condition $d = r$. We now consider the usual situation in which a near-separable matrix $\tilde{A}$ of (3) has $d \neq r$. The following approach was suggested in [8] for handling this situation; the SVD plays a key role. The SVD decomposes $\tilde{A}$ into $\tilde{A} = U \Sigma V^\top$, where $U$, $V$ and $\Sigma$ are as in (2). Using $\Sigma$, we construct a $d$-by-$m$ diagonal matrix $\Sigma^r$ such that

$$\Sigma^r = \begin{cases} (\mathrm{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0), 0) & \text{if } d < m, \\ (\mathrm{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0); 0) & \text{otherwise.} \end{cases} \tag{4}$$

Let $\tilde{A}^r = U \Sigma^r V^\top$. This is the best rank-$r$ approximation to $\tilde{A}$ under the matrix 2-norm or the Frobenius norm. Also, note that $\tilde{A} = \tilde{A}^r$ holds if $\tilde{A}$ does not contain noise, that is, if $N = 0$. We construct a matrix $P \in \mathbb{R}^{r \times m}$ such that

$$(P; 0) = U^\top \tilde{A}^r, \quad \text{equivalently,} \quad P = \mathrm{diag}(\sigma_1, \dots, \sigma_r)(V^r)^\top, \tag{5}$$

where $V^r = (v_1, \dots, v_r) \in \mathbb{R}^{m \times r}$ for the column vectors $v_1, \dots, v_r$ of $V$. As we will see in Section 3.1, $P$ can be thought of as an $r$-by-$m$ near-separable matrix having an $r$-by-$r$ basis matrix. Therefore, we can apply the discussion in the previous section to $P$. Let $\mathcal{S} = \{p_1, \dots, p_m\}$ for the column vectors of $P$. We compute the optimal solution $L^*$ of $\mathrm{P}(\mathcal{S})$ and run Algorithm 1 on $((L^*)^{1/2}P, r)$. Algorithm 2 summarizes the steps of the preconditioned SPA. The description is almost the same as that of Algorithm 2 of [8].
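The two expressions for $P$ in (5) can be compared numerically. The sketch below, on illustrative random data, forms $P = \mathrm{diag}(\sigma_1, \dots, \sigma_r)(V^r)^\top$ from one SVD of $\tilde{A}$ and checks that the first $r$ rows of $U^\top\tilde{A}^r$ equal $P$, that the remaining rows vanish, and that $P = U_r^\top\tilde{A}$ for the first $r$ left singular vectors $U_r$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, r = 8, 12, 3
A = rng.random((d, r)) @ rng.random((r, m))       # rank-r matrix
A_tilde = A + 1e-3 * rng.standard_normal((d, m))  # small perturbation

U, s, Vt = np.linalg.svd(A_tilde)                 # full SVD: U is d-by-d
P = s[:r, None] * Vt[:r, :]                       # diag(sigma_1, ..., sigma_r) (V^r)^T
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]              # best rank-r approximation A~^r
UtAr = U.T @ A_r                                  # should equal (P; 0)
print(np.allclose(UtAr[:r], P), np.allclose(UtAr[r:], 0.0))   # True True
print(np.allclose(U[:, :r].T @ A_tilde, P))                   # True
```

All quantities come from the same SVD call, so the sign ambiguity of singular vectors does not affect the comparison.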



Algorithm 2 Preconditioned SPA

Input: A $d$-by-$m$ real matrix $\tilde{A}$ and a positive integer $r$.

Output: An index set $\mathcal{I}$.

1: Compute the SVD of $\tilde{A}$. Let $\sigma_1, \dots, \sigma_r$ be the $r$ largest singular values, and $v_1, \dots, v_r \in \mathbb{R}^m$ the corresponding right singular vectors. Construct $P = \mathrm{diag}(\sigma_1, \dots, \sigma_r)(V^r)^\top \in \mathbb{R}^{r \times m}$ for $V^r = (v_1, \dots, v_r)$.

2: Compute the optimal solution $L^*$ of $\mathrm{P}(\mathcal{S})$ for $\mathcal{S} = \{p_1, \dots, p_m\}$, where $p_1, \dots, p_m$ are the column vectors of $P$.

3: Construct $P^\circ = (L^*)^{1/2}P$. Run Algorithm 1 on the input $(P^\circ, r)$, and output the index set $\mathcal{I}$ obtained by the algorithm.
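Putting the pieces together, a minimal end-to-end NumPy sketch of Algorithm 2 follows. Step 2 is solved only approximately, with Khachiyan's Frank-Wolfe-type method on the dual of $\mathrm{P}(\mathcal{S})$; this solver, its tolerance `tol`, and the helper names (`_mvee_origin`, `_spd_sqrt`, `_spa`, `preconditioned_spa`) are our hypothetical choices, not those of [8]. The demo uses a noisy separable matrix with $d > r$.

```python
import numpy as np

def _mvee_origin(P, tol=1e-3, max_iter=100_000):
    """Approximate L* of P(S) for the columns of P (Khachiyan's method on the dual)."""
    r, m = P.shape
    u = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        M = (P * u) @ P.T
        w = np.sum(P * np.linalg.solve(M, P), axis=0)   # w_i = p_i^T M^{-1} p_i
        j = int(np.argmax(w))
        if w[j] <= r * (1.0 + tol):
            break
        step = (w[j] - r) / (r * (w[j] - 1.0))
        u *= 1.0 - step
        u[j] += step
    M = (P * u) @ P.T
    w = np.sum(P * np.linalg.solve(M, P), axis=0)
    return np.linalg.inv(M) / w.max()                   # feasible for every constraint

def _spd_sqrt(L):
    """Square root of a symmetric positive definite matrix via eigh."""
    lam, U = np.linalg.eigh(L)
    return (U * np.sqrt(lam)) @ U.T

def _spa(S, r):
    """Algorithm 1 (SPA) on the columns of S; returns r indices (0-based)."""
    S = np.array(S, dtype=float)
    chosen = []
    for _ in range(r):
        i_star = int(np.argmax(np.sum(S * S, axis=0)))
        t = S[:, i_star].copy()
        S -= np.outer(t, t @ S) / (t @ t)
        chosen.append(i_star)
    return chosen

def preconditioned_spa(A_tilde, r):
    """Algorithm 2 (preconditioned SPA) on a d-by-m matrix A_tilde."""
    # Step 1: P = diag(sigma_1, ..., sigma_r) (V^r)^T from the SVD of A_tilde.
    _, s, Vt = np.linalg.svd(A_tilde, full_matrices=False)
    P = s[:r, None] * Vt[:r, :]
    # Step 2: approximate optimal solution L* of P(S), S = columns of P.
    L_star = _mvee_origin(P)
    # Step 3: P° = (L*)^{1/2} P, then Algorithm 1 on (P°, r).
    return _spa(_spd_sqrt(L_star) @ P, r)

rng = np.random.default_rng(5)
d, r = 10, 3
F = rng.random((d, r))
F[:r] += 3.0 * np.eye(r)                       # well-conditioned basis matrix (illustrative)
K = rng.random((r, 9))
K *= 0.7 / K.sum(axis=0)                       # every column of K sums to 0.7
A = np.hstack([F @ K, F])                      # basis columns are 9, 10, 11
A_tilde = A + 1e-6 * rng.standard_normal(A.shape)
print(sorted(preconditioned_spa(A_tilde, r)))  # [9, 10, 11]
```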


2.3 Our Result and Its Comparison with Theorem 2 by Gillis and Vavasis 

Gillis and Vavasis [8] reported empirical results suggesting that Algorithm 2 can improve the noise robustness of SPA. However, they did not give a formal analysis. We give one here.

Theorem 3. Let $\tilde{A} = A + N$ for $A \in \mathbb{R}^{d \times m}$ and $N \in \mathbb{R}^{d \times m}$. Suppose that $r \ge 2$ and $A$ satisfies Assumption 1. If $N$ satisfies $\|N\|_2 = \epsilon$ with

$$\epsilon \le \frac{\sigma_{\min}(F)}{1225\sqrt{r}},$$

then Algorithm 2 with the input $(\tilde{A}, r)$ returns an output $\mathcal{I}$ such that there is an order of the elements in $\mathcal{I}$ satisfying

$$\|\tilde{a}_{\mathcal{I}(j)} - f_j\|_2 \le (432\kappa(F) + 4)\epsilon$$

for all $j = 1, \dots, r$.
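As with Theorem 1, the constants of Theorem 3 can be evaluated directly. The sketch transcribes the noise bound $\sigma_{\min}(F)/(1225\sqrt{r})$ on $\|N\|_2$ and the error factor $432\kappa(F) + 4$. The two theorems measure noise differently (column norms versus $\|N\|_2$), so the printed comparison with Theorem 1 only shows how the error factors scale: linearly in $\kappa(F)$ here, quadratically there. The function name is ours.

```python
import numpy as np

def theorem3_bounds(F):
    """Return (eps_bound, factor): Theorem 3 needs ||N||_2 = eps <= eps_bound
    and then guarantees ||a~_{I(j)} - f_j||_2 <= factor * eps (for r >= 2)."""
    s = np.linalg.svd(F, compute_uv=False)
    r = F.shape[1]
    eps_bound = s[-1] / (1225.0 * np.sqrt(r))
    factor = 432.0 * (s[0] / s[-1]) + 4.0
    return float(eps_bound), float(factor)

for kappa in (1.0, 10.0, 100.0):
    F = np.diag([kappa, 1.0])                    # 2-by-2 basis matrix with kappa(F) = kappa
    _, f3 = theorem3_bounds(F)
    f1 = 1.0 + 80.0 * kappa ** 2                 # error factor of Theorem 1, for comparison
    print(kappa, f3, f1)
```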


The proof is given in Section 3. Note that, in this paper, we use $\epsilon$ to denote the size of $\|N\|_2$ or $\|n_i\|_2$ for the noise matrix $N$ of a near-separable matrix. Let us compare our result, Theorem 3, with that of Gillis and Vavasis, Theorem 2. The main advantage of ours is that it ensures the noise robustness of the preconditioned SPA for separable NMF problems without the condition $d = r$, whereas their result requires that condition. The dimension $d$ is usually greater than the factorization rank $r$ in separable NMF problems derived from actual applications, such as topic extraction from documents [4, 2, 11] and endmember detection in hyperspectral images [7, 8]. Therefore, our result can serve as a guide for assessing how robust the preconditioned SPA is to noise in such applications.

As we will see in Section 3, $P$ in step 1 of Algorithm 2 is an $r$-by-$m$ near-separable matrix having an $r$-by-$r$ basis matrix. Furthermore, Assumption 1 holds for the basis matrix of $P$ if the amount of noise in the input matrix is small. Therefore, Theorem 2 can be applied to $P$, which yields a result similar to ours.

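Proposition 1. Let $\tilde{A} = A + N$ for $A \in \mathbb{R}^{d \times m}$ and $N \in \mathbb{R}^{d \times m}$. Suppose that Assumption 1 holds for $A$ in $\tilde{A}$. If $N$ satisfies $\|N\|_2 = \epsilon$ with

$$\epsilon < O\left(\frac{\sigma_{\min}(F)}{r\sqrt{r}}\right),$$

then Algorithm 2 with the input $(\tilde{A}, r)$ returns an output $\mathcal{I}$ such that the size of the basis error of $\mathcal{I}$ is at most $O(\kappa(F)\epsilon)$.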


The proof is given in Section 3. Although the size of the basis error in Proposition 1 is of the same order as ours, the allowable amount of noise is smaller by a factor of $1/r$.

Our allowed noise range is described using the norm of $N$, whereas theirs is described using the norms of the column vectors $n_i$. Hence, we rewrite our result in terms of the norms of $n_i$. Taking into account the fact that $\|N\|_2 \le \|N\|_F \le \sqrt{m}\,\max_i \|n_i\|_2$ for $N \in \mathbb{R}^{d \times m}$, Theorem 3 implies the following corollary.

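Corollary 1. Let $\tilde{A} = A + N$ for $A \in \mathbb{R}^{d \times m}$ and $N \in \mathbb{R}^{d \times m}$. Suppose that the same conditions as in Theorem 3 hold. If the column vectors $n_i$ of $N$ satisfy $\|n_i\|_2 \le \epsilon$ for all $i = 1, \dots, m$ with

$$\epsilon < O\left(\frac{\sigma_{\min}(F)}{\sqrt{rm}}\right),$$

then Algorithm 2 with the input $(\tilde{A}, r)$ returns an output $\mathcal{I}$ such that the size of the basis error of $\mathcal{I}$ is at most $O(\kappa(F)\epsilon)$.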
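The factor $\sqrt{m}$ in the corollary comes from $\|N\|_2 \le \|N\|_F \le \sqrt{m}\,\max_i\|n_i\|_2$. The sketch below, on purely illustrative random data, checks this chain and shows that it can be tight: when all columns of $N$ are equal, $N$ has rank one and $\|N\|_2 = \sqrt{m}\,\epsilon$ exactly.

```python
import numpy as np

rng = np.random.default_rng(6)
d, m, eps = 20, 500, 1e-3

N = rng.standard_normal((d, m))
N *= eps / np.linalg.norm(N, axis=0)            # every column has ||n_i||_2 = eps
spec, frob = np.linalg.norm(N, 2), np.linalg.norm(N, "fro")
print(spec <= frob, np.isclose(frob, np.sqrt(m) * eps))   # True True

n = rng.standard_normal(d)
N_worst = np.tile((eps / np.linalg.norm(n)) * n[:, None], (1, m))   # m identical columns
print(np.isclose(np.linalg.norm(N_worst, 2), np.sqrt(m) * eps))     # True: the bound is attained
```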


We see that a factor of $1/\sqrt{m}$ appears in the allowed noise range. When handling separable NMF problems from actual applications, it is reasonable to suppose that $m$ corresponds to the number of data points, which could be large. In such a situation, the allowed noise range of our result becomes narrower.
3 Analysis of the Noise Robustness of Algorithm 2

The main goal of this section is to prove Theorem 3. In the discussion of the proof, we will see 
that Theorem 2 implies Proposition 1. 


3.1 Preliminaries 

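Let $\tilde{A}$ be of the form given in (3). We consider Algorithm 2 on the input $(\tilde{A}, r)$. Step 1 computes the SVD of $\tilde{A}$ and decomposes it into $\tilde{A} = U \Sigma V^\top$, where $U$, $V$ and $\Sigma$ are as in (2). The rank-$r$ approximation matrix is given as $\tilde{A}^r = U \Sigma^r V^\top$ by using $\Sigma^r$ of (4). We denote $\tilde{A} - \tilde{A}^r$ by $\tilde{A}^{r,\perp}$. Then $\tilde{A}$ can be represented as

$$\tilde{A} = \tilde{A}^r + \tilde{A}^{r,\perp} \tag{6}$$

with

$$\tilde{A}^r = U \Sigma^r V^\top \quad \text{and} \quad \tilde{A}^{r,\perp} = U \Sigma^{r,\perp} V^\top,$$

where we let $\Sigma^{r,\perp} = \Sigma - \Sigma^r$.

$P$ in step 1 of Algorithm 2 is given as (5). Using relation (6), it can be rewritten as

$$\begin{aligned} (P; 0) &= U^\top \tilde{A}^r \\ &= U^\top(\tilde{A} - \tilde{A}^{r,\perp}) \\ &= U^\top(A + N - \tilde{A}^{r,\perp}) \\ &= U^\top(A + \hat{N}) \\ &= U^\top\big((F, FK)\Pi + \hat{N}\big) \\ &= U^\top(F + \hat{N}^{(1)}, FK + \hat{N}^{(2)})\Pi \\ &= U^\top(\hat{F}, \hat{F}K + \bar{N})\Pi. \end{aligned} \tag{7}$$

In the above, we have used the notation $\hat{N} \in \mathbb{R}^{d \times m}$, $\hat{N}^{(1)} \in \mathbb{R}^{d \times r}$, $\hat{N}^{(2)} \in \mathbb{R}^{d \times (m-r)}$, $\hat{F} \in \mathbb{R}^{d \times r}$ and $\bar{N} \in \mathbb{R}^{d \times (m-r)}$ such that

$$\hat{N} = N - \tilde{A}^{r,\perp} = (\hat{N}^{(1)}, \hat{N}^{(2)})\Pi, \tag{8}$$

$$\hat{F} = F + \hat{N}^{(1)}, \tag{9}$$

$$\bar{N} = -\hat{N}^{(1)}K + \hat{N}^{(2)}. \tag{10}$$

Accordingly, $P$ can be represented as

$$P = (G, GK + S)\Pi \tag{11}$$

by using $G \in \mathbb{R}^{r \times r}$ and $S \in \mathbb{R}^{r \times (m-r)}$ such that

$$(G; 0) = U^\top \hat{F} \quad \text{and} \quad (S; 0) = U^\top \bar{N}; \tag{12}$$

the last $d - r$ rows vanish because the columns of $\tilde{A}^r$ lie in the span of the first $r$ left singular vectors. $P^\circ$ in step 3 of Algorithm 2 is

$$P^\circ = (G^\circ, G^\circ K + S^\circ)\Pi$$

by using $G^\circ \in \mathbb{R}^{r \times r}$ and $S^\circ \in \mathbb{R}^{r \times (m-r)}$ such that

$$G^\circ = (L^*)^{1/2} G \quad \text{and} \quad S^\circ = (L^*)^{1/2} S.$$

Hence, $P$ and $P^\circ$ are near-separable matrices, and $(G, GK)\Pi$ and $(G^\circ, G^\circ K)\Pi$ are the corresponding separable matrices. In particular, $G$ and $G^\circ$ are the basis matrices, and $(0, S)\Pi$ and $(0, S^\circ)\Pi$ are the noise matrices. Note that $P$ and $P^\circ$ are $r$-by-$m$ near-separable matrices with $r$-by-$r$ basis matrices.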

3.2 Proof of Proposition 1 

Here, we prove several lemmas that will be needed in the subsequent discussion. Similar statements have already been proven in [11]. More precisely, Lemmas 2, 3 and 4 correspond to (a), (b) and (c) of Lemma 7 of that paper; we include them here to make the discussion self-contained. After that, we prove Proposition 1, which follows from Theorem 2 together with these lemmas.

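Lemma 1. Let $\tilde{A} = A + N \in \mathbb{R}^{d \times m}$. Then $|\sigma_i(\tilde{A}) - \sigma_i(A)| \le \|N\|_2$ for each $i = 1, \dots, t$, where $t = \min(d, m)$ and $\sigma_i(\cdot)$ denotes the $i$th largest singular value.

Proof. See Corollary 8.6.2 of [9]. □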

Lemma 2 (Lemma 7(a) of [11]). $\|\hat{N}\|_2 \le 2\|N\|_2$.

Proof. Since $\hat{N}$ is of the form (8), $\|\hat{N}\|_2 = \|N - \tilde{A}^{r,\perp}\|_2 \le \|N\|_2 + \|\tilde{A}^{r,\perp}\|_2$. Also, $\|\tilde{A}^{r,\perp}\|_2 = \sigma_{\max}(\tilde{A}^{r,\perp}) = \sigma_{r+1}(\tilde{A})$. From Lemma 1 and $\sigma_{r+1}(A) = 0$, we have $\sigma_{r+1}(\tilde{A}) \le \|N\|_2$. Thus, $\|\hat{N}\|_2 \le 2\|N\|_2$. □
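Lemmas 1 and 2 are easy to check numerically. The sketch below uses a random rank-$r$ matrix $A$ (so $\sigma_{r+1}(A) = 0$) and a random perturbation $N$; the instance is illustrative, and the checks only confirm the inequalities on this example rather than prove them.

```python
import numpy as np

rng = np.random.default_rng(7)
d, m, r = 8, 15, 3
A = rng.random((d, r)) @ rng.random((r, m))          # rank r, so sigma_{r+1}(A) = 0
N = 1e-2 * rng.standard_normal((d, m))
A_tilde = A + N
norm_N = np.linalg.norm(N, 2)

# Lemma 1: |sigma_i(A~) - sigma_i(A)| <= ||N||_2 for i = 1, ..., min(d, m)
sA = np.linalg.svd(A, compute_uv=False)
U, s, Vt = np.linalg.svd(A_tilde, full_matrices=False)
lemma1 = bool(np.all(np.abs(s - sA) <= norm_N + 1e-12))

# Lemma 2: N^ = N - A~^{r,perp} satisfies ||N^||_2 <= 2 ||N||_2
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]                 # A~^r
A_perp = A_tilde - A_r                               # A~^{r,perp}; its 2-norm is sigma_{r+1}(A~)
N_hat = N - A_perp
lemma2 = bool(np.linalg.norm(N_hat, 2) <= 2 * norm_N + 1e-12)
print(lemma1, lemma2)                                # True True
```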

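Lemma 3 (Lemma 7(b) of [11]). Let $s_i$ be the column vectors of $S$ in (11). Suppose that $K$ satisfies Assumption 1(b). Then $\|s_i\|_2 \le 4\|N\|_2$ for each $i = 1, \dots, m-r$.

Proof. We see from (12) and (10) that $(s_i; 0) = U^\top \bar{n}_i$ and $\bar{n}_i = -\hat{N}^{(1)}k_i + \hat{n}^{(2)}_i$. Here, $k_i$ and $\hat{n}^{(2)}_i$ are the column vectors of $K$ and $\hat{N}^{(2)}$, respectively. Thus, $\|s_i\|_2 = \|-\hat{N}^{(1)}k_i + \hat{n}^{(2)}_i\|_2 \le \|\hat{N}^{(1)}\|_2\|k_i\|_2 + \|\hat{n}^{(2)}_i\|_2$. Both $\|\hat{N}^{(1)}\|_2$ and $\|\hat{n}^{(2)}_i\|_2$ are at most $\|\hat{N}\|_2 \le 2\|N\|_2$ by Lemma 2, and $\|k_i\|_2 \le 1$ due to Assumption 1(b). Hence, $\|s_i\|_2 \le 4\|N\|_2$. □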

Lemma 4 (Lemma 7(c) of [11]). $|\sigma_j(F) - \sigma_j(G)| \le 2\|N\|_2$ for each $j = 1, \dots, r$.

Proof. We see from (12) that $G$ and $\hat{F}$ are related by $(G; 0) = U^\top \hat{F}$. Since $U$ is an orthogonal matrix, the singular values of $G$ coincide with those of $\hat{F}$. Also, $\hat{F}$ is of the form (9). Thus, from Lemma 1, we have $|\sigma_j(F) - \sigma_j(G)| = |\sigma_j(F) - \sigma_j(F + \hat{N}^{(1)})| \le \|\hat{N}^{(1)}\|_2 \le 2\|N\|_2$. The last inequality follows from Lemma 2. □

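Lemma 5. Let $\tilde{a}_k$ and $p_k$ be the column vectors of $\tilde{A}$ and $P$. Also, let $f_j$ and $g_j$ be those of $F$ and $G$. We have $\|\tilde{a}_k - f_j\|_2 \le \|p_k - g_j\|_2 + 3\|N\|_2$ for any $k$ in $\{1, \dots, m\}$ and any $j$ in $\{1, \dots, r\}$.

Proof. We see from (7) and (6) that $\tilde{a}_k$ and $p_k$ are related as

$$\begin{pmatrix} p_k \\ 0 \end{pmatrix} = U^\top \tilde{a}^r_k \quad \text{and} \quad \tilde{a}_k = \tilde{a}^r_k + \tilde{a}^{r,\perp}_k,$$

where $\tilde{a}^r_k$ and $\tilde{a}^{r,\perp}_k$ are the column vectors of $\tilde{A}^r$ and $\tilde{A}^{r,\perp}$. Also, from (12) and (9), $g_j$ and $f_j$ are related as

$$\begin{pmatrix} g_j \\ 0 \end{pmatrix} = U^\top \hat{f}_j \quad \text{and} \quad \hat{f}_j = f_j + \hat{n}^{(1)}_j,$$

where $\hat{f}_j$ and $\hat{n}^{(1)}_j$ are the column vectors of $\hat{F}$ and $\hat{N}^{(1)}$. Therefore,

$$\begin{aligned} \|\tilde{a}_k - f_j\|_2 &= \|(\tilde{a}^r_k + \tilde{a}^{r,\perp}_k) - (\hat{f}_j - \hat{n}^{(1)}_j)\|_2 \\ &= \|U^\top(\tilde{a}^r_k - \hat{f}_j) + U^\top(\tilde{a}^{r,\perp}_k + \hat{n}^{(1)}_j)\|_2 \\ &\le \|U^\top(\tilde{a}^r_k - \hat{f}_j)\|_2 + \|U^\top \tilde{a}^{r,\perp}_k\|_2 + \|U^\top \hat{n}^{(1)}_j\|_2 \\ &= \|p_k - g_j\|_2 + \|\tilde{a}^{r,\perp}_k\|_2 + \|\hat{n}^{(1)}_j\|_2. \end{aligned}$$

Here, $U$ is the $d$-by-$d$ orthogonal matrix in (7) that consists of the left singular vectors of $\tilde{A}$. By Lemma 2, $\|\hat{n}^{(1)}_j\|_2 \le \|\hat{N}\|_2 \le 2\|N\|_2$. Also, $\|\tilde{a}^{r,\perp}_k\|_2 \le \|\tilde{A}^{r,\perp}\|_2 \le \|N\|_2$, where the last inequality is obtained in the same way as in the proof of Lemma 2. Therefore, $\|\tilde{a}_k - f_j\|_2 \le \|p_k - g_j\|_2 + 3\|N\|_2$. □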
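The construction (7) to (12) and Lemmas 3 to 5 can also be traced numerically. The sketch below takes $\Pi = I$ for simplicity, builds $\hat{N}$, $\hat{F}$, $\bar{N}$, $G$ and $S$ exactly as in (8) to (12), and checks the identities and bounds on one random instance; it is an illustration, not a proof.

```python
import numpy as np

rng = np.random.default_rng(8)
d, r, q = 9, 3, 7                                   # q = m - r
F = rng.random((d, r))
K = rng.random((r, q))
K /= K.sum(axis=0)                                  # ||k_i||_2 <= ||k_i||_1 = 1
A = np.hstack([F, F @ K])                           # separable matrix with Pi = I
N = 1e-3 * rng.standard_normal(A.shape)
A_tilde = A + N
norm_N = np.linalg.norm(N, 2)

U, s, Vt = np.linalg.svd(A_tilde)                   # U is d-by-d
A_r = (U[:, :r] * s[:r]) @ Vt[:r, :]                # A~^r
P = (U.T @ A_r)[:r]                                 # (P; 0) = U^T A~^r, see (7)
N_hat = N - (A_tilde - A_r)                         # (8)
N1, N2 = N_hat[:, :r], N_hat[:, r:]
F_hat = F + N1                                      # (9)
N_bar = N2 - N1 @ K                                 # (10)
UtF, UtNbar = U.T @ F_hat, U.T @ N_bar
G, S = UtF[:r], UtNbar[:r]                          # (12)

structure_ok = bool(np.allclose(UtF[r:], 0.0) and np.allclose(UtNbar[r:], 0.0)
                    and np.allclose(P, np.hstack([G, G @ K + S])))   # (11)
lemma3 = bool(np.linalg.norm(S, axis=0).max() <= 4 * norm_N + 1e-12)
sF = np.linalg.svd(F, compute_uv=False)
sG = np.linalg.svd(G, compute_uv=False)
lemma4 = bool(np.all(np.abs(sF - sG) <= 2 * norm_N + 1e-12))
lhs = np.linalg.norm(A_tilde[:, :, None] - F[:, None, :], axis=0)   # ||a~_k - f_j||_2
rhs = np.linalg.norm(P[:, :, None] - G[:, None, :], axis=0) + 3 * norm_N
lemma5 = bool(np.all(lhs <= rhs + 1e-12))
print(structure_ok, lemma3, lemma4, lemma5)          # True True True True
```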


We are now ready to prove Proposition 1.

(Proof of Proposition 1). We show that Theorem 2 can be applied to $P$ in step 1 of Algorithm 2 if the amount of noise $\|N\|_2$ is below some level. $P$ can be written as (11) and hence is an $r$-by-$m$ near-separable matrix having an $r$-by-$r$ basis matrix $G$. Choose a real number $\gamma$ such that $\gamma > 2$, and suppose that $\|N\|_2 < \frac{1}{\gamma}\sigma_{\min}(F)$. It follows from Lemma 4 and Assumption 1(a) that $\sigma_{\min}(G) \ge \frac{\gamma - 2}{\gamma}\sigma_{\min}(F) > 0$. Therefore, Assumption 1 holds for $G$. This means that Theorem 2 can be applied to $P$, and its application leads to the following statement. Let $L^*$ be the optimal solution of $\mathrm{P}(\mathcal{S})$, where $\mathcal{S} = \{p_1, \dots, p_m\}$ for $P$. If the column vectors $s_i$ of $S$ satisfy $\|s_i\|_2 \le \epsilon$ for all $i = 1, \dots, m-r$ with $\epsilon < O(\sigma_{\min}(G)/(r\sqrt{r}))$, then Algorithm 1 with the input $((L^*)^{1/2}P, r)$ returns an output $\mathcal{I}$ such that there is an order of the elements in $\mathcal{I}$ satisfying $\|p_{\mathcal{I}(j)} - g_j\|_2 \le O(\kappa(G)\epsilon)$ for $j = 1, \dots, r$.

Suppose that

$$\|N\|_2 < \frac{\sigma_{\min}(F)}{4\alpha r\sqrt{r} + 2} \tag{13}$$

for some $\alpha > 1$. This corresponds to the choice $\gamma = 4\alpha r\sqrt{r} + 2 > 2$ above, so $\|N\|_2 < \frac{1}{\gamma}\sigma_{\min}(F)$ also holds. From Lemmas 3 and 4, $\|s_i\|_2 \le 4\|N\|_2$ and $\sigma_{\min}(G) \ge \sigma_{\min}(F) - 2\|N\|_2$. Since $\|N\|_2$ is supposed to satisfy (13), we have

$$\|s_i\|_2 \le 4\|N\|_2 < \frac{\sigma_{\min}(F) - 2\|N\|_2}{\alpha r\sqrt{r}} \le \frac{\sigma_{\min}(G)}{\alpha r\sqrt{r}}.$$

This gives $\|s_i\|_2 < \sigma_{\min}(G)/(\alpha r\sqrt{r})$. Also, from Lemma 4 and (13),

$$\kappa(G) = \frac{\sigma_{\max}(G)}{\sigma_{\min}(G)} \le \frac{\sigma_{\max}(F) + 2\|N\|_2}{\sigma_{\min}(F) - 2\|N\|_2} \le \left(1 + \frac{1}{2\alpha r\sqrt{r}}\right)\kappa(F) + \frac{1}{2\alpha r\sqrt{r}} \le 2\kappa(F) + 1.$$

Therefore, from Lemmas 3 and 5, we have

$$\|\tilde{a}_{\mathcal{I}(j)} - f_j\|_2 - 3\|N\|_2 \le \|p_{\mathcal{I}(j)} - g_j\|_2 \le \beta\kappa(G)\|s_{i^*}\|_2 \le (8\beta\kappa(F) + 4\beta)\|N\|_2$$

for some $\beta > 0$, where $i^* = \arg\max_{i=1,\dots,m-r}\|s_i\|_2$. Consequently, if $\|N\|_2$ satisfies (13), then Theorem 2 can be applied to $P$, and the application implies that $\|\tilde{a}_{\mathcal{I}(j)} - f_j\|_2 \le (8\beta\kappa(F) + 4\beta + 3)\|N\|_2$. □

3.3 Proof of Theorem 3 

The core of the proof of Theorem 3 is to show that Theorem 1 can be applied to $P^\circ$ if $\|N\|_2$ is small. To do this, we derive an upper bound on the condition number of $G^\circ$. For the subsequent discussion, we need the following lemma.

Lemma 6. Let $\alpha$ be a real constant satisfying $\alpha > \sqrt{2}$ and $r$ be any integer satisfying $r \ge 2$. Suppose that $\|N\|_2 < \sigma_{\min}(F)/(\alpha\sqrt{r})$ and $K$ satisfies Assumption 1(b). Then,

(a) $\left(1 - \frac{2}{\alpha\sqrt{r}}\right)\sigma_{\min}(F) \le \sigma_{\min}(G) \le \sigma_{\max}(G) \le \sigma_{\max}(F) + \frac{2}{\alpha\sqrt{r}}\sigma_{\min}(F)$;

(b) $\|s_i\|_2 \le \frac{4\sigma_{\min}(G)}{\alpha\sqrt{r} - 2}$ for $i = 1, \dots, m-r$.

Proof. Statement (a) follows from Lemma 4, and (b) follows from Lemma 3 and (a). □

3.3.1 Upper Bound on the Condition Number of G° 

We show that the condition number of $G^\circ$ is bounded from above by a constant if $\|N\|_2$ is below some level. This can be proved by applying Theorem 2.8 in [8] to $P$ and taking into account the discussion in the proof of Proposition 1. However, that approach gives such an upper bound only under $\|N\|_2 < O(\sigma_{\min}(F)/(r\sqrt{r}))$. In contrast, we provide a similar bound under $\|N\|_2 < O(\sigma_{\min}(F)/\sqrt{r})$, which increases the allowable amount of noise $\|N\|_2$ by a factor of $r$ compared with the application of that theorem.

Our derivation follows that of [8]. More precisely, the bound is shown through Lemmas 7 and 8 and Proposition 2, which correspond to the lemmas of that paper as follows: Lemma 7 to Lemma 2.4, Lemma 8 to Lemmas 2.5 and 2.6, and Proposition 2 to Lemmas 2.6 and 2.7. Although a major part of our proof relies on the techniques developed in that paper, there are some differences. In particular, we use an alternative technique to prove Proposition 2, which allows us to derive the upper bound on the condition number of $G^\circ$ under $\|N\|_2 < O(\sigma_{\min}(F)/\sqrt{r})$. Remark 2 details the differences between the proofs.

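Now let us look at problem $\mathrm{P}(\mathcal{S})$ in step 2 of Algorithm 2. Recall that $\mathcal{S} = \{p_1, \dots, p_m\}$ for the column vectors $p_1, \dots, p_m$ of $P$. For convenience, we change the variable of $\mathrm{P}(\mathcal{S})$ to $C = G^\top L G$ for the nonsingular matrix $G$ and consider the problem

$$\begin{aligned} \mathrm{Q}(\mathcal{S}):\quad & \text{minimize} && -\log\det(C) + 2\log|\det(G)|, \\ & \text{subject to} && p^\top (G^{-1})^\top C G^{-1} p \le 1 \ \text{ for all } p \in \mathcal{S}, \\ &&& C \succ 0, \end{aligned}$$

which is equivalent to $\mathrm{P}(\mathcal{S})$ under the nonsingular transformation by $G$. $C$ is the decision variable. Let $C^*$ be the optimal solution, and let $\lambda_j$ denote the $j$th largest eigenvalue of $C^*$. We will continue to use the notation $\lambda_j$ for this purpose throughout this section. For the $j$th singular value $\sigma_j$ of $G^\circ$, we have $\sigma_j = \sqrt{\lambda_j}$ for $j = 1, \dots, r$, since $(G^\circ)^\top G^\circ = G^\top L^* G = C^*$.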
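The change of variables can be sanity-checked numerically. In the sketch below, $G$ is a random nonsingular matrix and $L$ an arbitrary positive definite matrix standing in for $L^*$ (both illustrative assumptions); it confirms that the singular values of $G^\circ = L^{1/2}G$ are the square roots of the eigenvalues of $C = G^\top L G$, and that the objectives and constraints of $\mathrm{P}(\mathcal{S})$ and $\mathrm{Q}(\mathcal{S})$ agree.

```python
import numpy as np

rng = np.random.default_rng(9)
r = 4
G = rng.random((r, r)) + r * np.eye(r)            # nonsingular stand-in for the basis matrix G
B = rng.standard_normal((r, r))
L = B @ B.T + np.eye(r)                           # arbitrary positive definite stand-in for L*

lam_L, V = np.linalg.eigh(L)
G0 = ((V * np.sqrt(lam_L)) @ V.T) @ G             # G° = L^{1/2} G
C = G.T @ L @ G                                   # C = G^T L G

sv = np.linalg.svd(G0, compute_uv=False)          # singular values of G°, descending
lam_C = np.linalg.eigvalsh(C)[::-1]               # eigenvalues of C, descending
print(np.allclose(sv, np.sqrt(lam_C)))            # True

# objectives: -log det L = -log det C + 2 log|det G|
lhs = -np.linalg.slogdet(L)[1]
rhs = -np.linalg.slogdet(C)[1] + 2.0 * np.linalg.slogdet(G)[1]
print(np.isclose(lhs, rhs))                       # True

# constraints: p^T L p = (G^{-1} p)^T C (G^{-1} p)
p = rng.standard_normal(r)
y = np.linalg.solve(G, p)
print(np.isclose(p @ L @ p, y @ C @ y))           # True
```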

In Lemmas 7 and 8, we derive lower and upper bounds on $\det(C^*)$ in terms of $r$ and $\lambda_j$. These bounds yield an inequality that $r$ and $\lambda_j$ must satisfy. In Proposition 2, using this inequality, we derive lower and upper bounds on $\lambda_j$, whose square root is equal to the singular value $\sigma_j$ of $G^\circ$. Lemma 7 is almost the same as Lemma 2.4 of [8].

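Lemma 7. Let $\alpha$ be a real constant satisfying $\alpha > \sqrt{2}$ and $r$ be any integer satisfying $r \ge 2$. Suppose that $\|N\|_2 < \sigma_{\min}(F)/(\alpha\sqrt{r})$ and $K$ satisfies Assumption 1(b). Then,

$$\det(C^*) \ge \left(\frac{\alpha\sqrt{r} - 2}{\alpha\sqrt{r} + 2}\right)^{2r}.$$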

Proof. For an $r$-by-$r$ scaled identity matrix $\theta I$ with a positive real number $\theta$, we derive an upper bound on $\theta$ such that $\theta I$ is feasible for $Q(\mathcal{S})$. Since $P$ can be written as (11), $\mathcal{S}$ contains two different types of vectors: one is $g_j$ and the other is $Gk_i + s_i$, where $g_j$, $k_i$, and $s_i$ are column vectors of $G$, $K$, and $S$, respectively. Therefore, $\theta I$ needs to satisfy two types of constraints,

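$$\theta\, g_j^\top (G^{-1})^\top I\, G^{-1} g_j \le 1 \quad \text{for } j = 1, \dots, r,$$

$$\theta\, (Gk_i + s_i)^\top (G^{-1})^\top I\, G^{-1} (Gk_i + s_i) \le 1 \quad \text{for } i = 1, \dots, m - r.$$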


Since $G^{-1} g_j$ is the $j$th unit vector, the first constraints hold if $\theta \le 1$. For the second constraints, we have


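$$
\begin{aligned}
(Gk_i + s_i)^\top (G^{-1})^\top I\, G^{-1} (Gk_i + s_i) &= \|k_i + G^{-1} s_i\|_2^2 \\
&\le \left(\|k_i\|_2 + \|G^{-1}\|_2 \|s_i\|_2\right)^2 \\
&\le \left(1 + \frac{4}{\alpha\sqrt{r} - 2}\right)^2 = \left(\frac{\alpha\sqrt{r} + 2}{\alpha\sqrt{r} - 2}\right)^2.
\end{aligned}
$$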

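The second inequality follows from Lemma 6 and from $\|k_i\|_2 \le 1$, which holds due to Assumption 1(b). Thus, the second constraints hold if $\theta \le \left(\frac{\alpha\sqrt{r} - 2}{\alpha\sqrt{r} + 2}\right)^2$. Let $\theta = \left(\frac{\alpha\sqrt{r} - 2}{\alpha\sqrt{r} + 2}\right)^2$. It satisfies $0 < \theta < 1$ because $\alpha\sqrt{r} > 2$, which follows from $\alpha > \sqrt{2}$ and $r \ge 2$. Thus, $\theta I$ is a feasible solution of $Q(\mathcal{S})$. Accordingly, for the optimal solution $C^*$ and the feasible solution $\theta I$, we have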


$$\det(C^*) \ge \det(\theta I) = \theta^r = \left(\frac{\alpha\sqrt{r} - 2}{\alpha\sqrt{r} + 2}\right)^{2r}.$$

□
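As a quick arithmetic check of the proof of Lemma 7, the sketch below evaluates $\theta$ for a few values of $r$, assuming the worst case $\|k_i\|_2 = 1$ and $\|G^{-1}\|_2\|s_i\|_2 = 4/(\alpha\sqrt{r} - 2)$ allowed by the bounds used in the proof. In that worst case the second-type constraint holds with equality, so this $\theta$ is the largest scale the argument permits. The helper names are hypothetical, and $\alpha = 1225$ is borrowed from Proposition 3 purely for illustration.

```python
import math


def lemma7_theta(alpha, r):
    """theta = ((alpha*sqrt(r) - 2) / (alpha*sqrt(r) + 2))^2 from the proof of Lemma 7."""
    t = alpha * math.sqrt(r)
    if t <= 2:
        raise ValueError("Lemma 7 needs alpha * sqrt(r) > 2")
    return ((t - 2) / (t + 2)) ** 2


def worst_case_second_constraint(alpha, r, theta):
    """theta * (1 + 4/(alpha*sqrt(r) - 2))^2: the second-type constraint value
    when ||k_i||_2 = 1 and ||G^{-1}||_2 ||s_i||_2 = 4/(alpha*sqrt(r) - 2)."""
    t = alpha * math.sqrt(r)
    return theta * (1 + 4 / (t - 2)) ** 2


alpha = 1225  # illustrative value, the one fixed later in Proposition 3
for r in (2, 5, 50, 5000):
    theta = lemma7_theta(alpha, r)
    det_lower = theta ** r  # det(theta * I) for an r-by-r identity matrix
    print(r, theta, worst_case_second_constraint(alpha, r, theta), det_lower)
```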


Lemma 8 corresponds to Lemmas 2.5 and 2.6 of [8]. The lemmas of that paper need a condition on the amount of noise in order to derive the upper bound on $\det(C^*)$, whereas this lemma does not. This comes from the difference in the structures of the near-separable matrices. Our lemma handles a near-separable matrix of the form (11), which has a favorable structure: the noise matrix contains an $r$-by-$r$ zero submatrix, and this zero submatrix corresponds to the basis matrix.


Lemma 8. Suppose that $r \ge 2$. Then,

$$\det(C^*) \le \left(\frac{r - \lambda_j}{r - 1}\right)^{r-1} \lambda_j$$

for each $j = 1, \dots, r$.

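Proof. We derive an upper bound on the sum of the eigenvalues of $C^*$, which is equal to $\mathrm{tr}(C^*)$. Since $C^*$ is feasible for $Q(\mathcal{S})$ and $G^{-1} g_j = e_j$ for the $j$th unit vector $e_j$, we have

$$g_j^\top (G^{-1})^\top C^* G^{-1} g_j \le 1, \quad \text{equivalently,} \quad \|(C^*)^{1/2} e_j\|_2^2 \le 1 \quad \text{for } j = 1, \dots, r.$$

Thus,

$$\sum_{i=1}^{r} \lambda_i = \mathrm{tr}(C^*) = \sum_{j=1}^{r} \|(C^*)^{1/2} e_j\|_2^2 \le r, \quad \text{and hence} \quad \sum_{i \ne j} \lambda_i \le r - \lambda_j.$$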


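The arithmetic-geometric mean inequality states that $(a_1 \times \cdots \times a_n)^{1/n} \le (a_1 + \cdots + a_n)/n$ for nonnegative real numbers $a_1, \dots, a_n$. Since the eigenvalues of $C^* \succ 0$ are positive, we have, for each $j = 1, \dots, r$,

$$\det(C^*) = \lambda_1 \times \cdots \times \lambda_r \le \left(\frac{\sum_{i \ne j} \lambda_i}{r - 1}\right)^{r-1} \lambda_j \le \left(\frac{r - \lambda_j}{r - 1}\right)^{r-1} \lambda_j.$$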


□ 
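Lemma 8 needs only $\mathrm{tr}(C^*) \le r$ and the AM-GM inequality. The randomized Python sketch below illustrates this (it is a sanity check, not a proof): it draws positive values with sum at most $r$ to play the role of the eigenvalues of $C^*$ and counts violations of the bound. The sampling scheme and helper names are assumptions chosen for illustration.

```python
import math
import random


def lemma8_bound(lam, j):
    """((r - lambda_j) / (r - 1))^(r - 1) * lambda_j for r = len(lam)."""
    r = len(lam)
    return ((r - lam[j]) / (r - 1)) ** (r - 1) * lam[j]


def random_spectrum(r, rng):
    """Positive values with sum at most r, playing the role of the eigenvalues
    of C* (all diagonal entries of C* are at most 1, so tr(C*) <= r)."""
    raw = [rng.random() + 1e-9 for _ in range(r)]
    scale = rng.uniform(0.1, 1.0) * r / sum(raw)
    return [x * scale for x in raw]


rng = random.Random(0)
violations = 0
for _ in range(10_000):
    r = rng.randint(2, 8)
    lam = random_spectrum(r, rng)
    det_c = math.prod(lam)  # det(C*) is the product of its eigenvalues
    for j in range(r):
        # A small relative slack absorbs floating-point rounding.
        if det_c > lemma8_bound(lam, j) * (1 + 1e-12):
            violations += 1
print("violations:", violations)
```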


We denote

$$\omega = \left(\frac{\alpha\sqrt{2} - 2}{\alpha\sqrt{2} + 2}\right)^4. \qquad (14)$$

The value of $\omega$ satisfies $0 < \omega < 1$ when $\alpha > \sqrt{2}$.


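Proposition 2. Let $\alpha$ be a real constant satisfying $\alpha > \sqrt{2}$, and let $r$ be any integer satisfying $r \ge 2$. Suppose that $\|N\|_2 \le \sigma_{\min}(F)/(\alpha\sqrt{r})$ and that $K$ satisfies Assumption 1(b). Then, $\lambda_j$ is bounded such that

$$1 - \sqrt{1 - \omega} \le \lambda_j \le 1 + \sqrt{1 - \omega} \quad \text{for each } j = 1, \dots, r.$$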
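Proof. Lemmas 7 and 8 tell us that $r$ and $\lambda_j$ need to satisfy the inequality

$$\left(\frac{\alpha\sqrt{r} - 2}{\alpha\sqrt{r} + 2}\right)^{2r} \le \left(\frac{r - \lambda_j}{r - 1}\right)^{r-1} \lambda_j. \qquad (15)$$

When $r = 2$, it becomes $\omega \le (2 - \lambda_j)\lambda_j$, that is, $\lambda_j^2 - 2\lambda_j + \omega \le 0$. Thus, it is necessary for $\lambda_j$ to satisfy $1 - \sqrt{1 - \omega} \le \lambda_j \le 1 + \sqrt{1 - \omega}$. □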
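For $r = 2$, inequality (15) is a quadratic condition on $\lambda_j$. The sketch below computes $\omega$ from (14) for $\alpha = 1225$ (the value fixed later in Proposition 3, used here only as an example) and the resulting interval for $\lambda_j$; the function names are illustrative.

```python
import math


def omega(alpha):
    """omega = ((alpha*sqrt(2) - 2) / (alpha*sqrt(2) + 2))^4, as in (14)."""
    t = alpha * math.sqrt(2)
    return ((t - 2) / (t + 2)) ** 4


def lambda_interval(w):
    """Solution set of w <= (2 - lam) * lam, i.e. lam^2 - 2*lam + w <= 0."""
    s = math.sqrt(1 - w)
    return 1 - s, 1 + s


w = omega(1225)
lo, hi = lambda_interval(w)
print(w, lo, hi)
```

For this $\alpha$, both endpoints lie within about $0.1$ of $1$, so the eigenvalues of $C^*$ are tightly clustered around $1$.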

Remark 2. Proposition 2 corresponds to Lemmas 2.6 and 2.7 of [8]. From the lower and upper bounds on $\det(C^*)$, the lemmas of that paper construct an inequality that $r$ and $\lambda_j$ need to satisfy, and determine the condition on $\lambda_j$ under which the inequality holds for all $r \ge 2$. In contrast, this proposition considers only the case $r = 2$ of inequality (15) and determines the condition on $\lambda_j$ from it.


Since $\sigma_j = \lambda_j^{1/2}$ for the $j$th singular value $\sigma_j$ of $G^\circ$, this proposition gives the following bounds on the singular values and the condition number of $G^\circ$.


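Corollary 2. Suppose that the same conditions as in Proposition 2 hold. Then, we have

$$\sigma_{\min}(G^\circ) \ge \left(1 - \sqrt{1 - \omega}\right)^{1/2}, \qquad \sigma_{\max}(G^\circ) \le \left(1 + \sqrt{1 - \omega}\right)^{1/2},$$

$$\kappa(G^\circ) \le \left(\frac{1 + \sqrt{1 - \omega}}{1 - \sqrt{1 - \omega}}\right)^{1/2}.$$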
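To give a sense of scale, the sketch below evaluates the closed-form bounds of Corollary 2 at $\alpha = 1225$ (an illustrative choice taken from Proposition 3) together with the factor $1 + 80\kappa(G^\circ)^2$ that enters Theorem 1. With $\kappa(G^\circ)$ at its upper bound, this factor equals $(81 + 79\sqrt{1-\omega})/(1 - \sqrt{1-\omega})$, which is where the quantity $81 + 79\sqrt{1-\omega}$ in the proof of Proposition 3 comes from. The code only evaluates the bounds; it does not compute $G^\circ$.

```python
import math


def corollary2_bounds(alpha):
    """Closed-form bounds of Corollary 2 on sigma_min, sigma_max and kappa of G°."""
    t = alpha * math.sqrt(2)
    w = ((t - 2) / (t + 2)) ** 4   # omega from (14)
    s = math.sqrt(1 - w)
    sigma_min_lb = math.sqrt(1 - s)
    sigma_max_ub = math.sqrt(1 + s)
    return sigma_min_lb, sigma_max_ub, sigma_max_ub / sigma_min_lb, s


smin, smax, kappa, s = corollary2_bounds(1225)
factor = 1 + 80 * kappa ** 2   # 1 + 80 kappa(G°)^2 at the upper bound on kappa
print(smin, smax, kappa, factor, (81 + 79 * s) / (1 - s))
```

The condition-number bound comes out at roughly $1.10$, i.e. the preconditioned basis $G^\circ$ is close to orthogonal.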


3.3.2 Application of Theorem 1 to P° 


As we saw in Section 3.1, $P^\circ$ in step 3 of Algorithm 2 is an $r$-by-$m$ near-separable matrix. In the proposition below, we show that Theorem 1 can be applied to $P^\circ$. Here, note that $s^\circ_i$, $p^\circ_{\mathcal{I}(j)}$, and $g^\circ_j$ in the proposition are column vectors of $S^\circ$, $P^\circ$, and $G^\circ$, respectively.

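Proposition 3. Let $\tilde{A} = A + N$ for $A \in \mathbb{R}^{d \times m}_{+}$ and $N \in \mathbb{R}^{d \times m}$. Suppose that $r \ge 2$ and $A$ satisfies Assumption 1. Let $\epsilon$ be such that $\|s^\circ_i\|_2 \le \epsilon$ for all $i = 1, \dots, m - r$. If

$$\|N\|_2 \le \frac{\sigma_{\min}(F)}{\alpha\sqrt{r}}$$

and $\alpha = 1225$, then Algorithm 2 with the input $(\tilde{A}, r)$ returns an output $\mathcal{I}$ such that there is an order of the elements in $\mathcal{I}$ satisfying

$$\|p^\circ_{\mathcal{I}(j)} - g^\circ_j\|_2 \le \left(80\kappa(G^\circ)^2 + 1\right)\epsilon$$

for all $j = 1, \dots, r$.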


Proof. First, we show that the separable matrix $G^\circ(I, K)\Pi$ in $P^\circ$ satisfies Assumption 1. From Corollary 2, we have $\sigma_{\min}(G^\circ) \ge (1 - \sqrt{1 - \omega})^{1/2} > 0$. The last inequality holds strictly, since the value of $\omega$ given in (14) satisfies $0 < \omega < 1$ due to $\alpha = 1225$. Thus, (a) of Assumption 1 is satisfied. Furthermore, (b) of Assumption 1 is satisfied, since that assumption concerns only $K$, which is the same in $A$ and in $P^\circ$.

Next, we show that the size of the noise matrix $(0, S^\circ)\Pi$ in $P^\circ$ is within the range allowed by Theorem 1. Namely, we show that the inequality

$$\|s^\circ_i\|_2 \le \min\left\{\frac{1}{2\sqrt{r - 1}}, \frac{1}{4}\right\} \frac{\sigma_{\min}(G^\circ)}{1 + 80\kappa(G^\circ)^2} \qquad (16)$$

holds for $i = 1, \dots, m - r$. We derive an upper bound on the left-hand side and a lower bound on the right-hand side. The left-hand side is bounded such that


$$\|s^\circ_i\|_2 = \|(L^*)^{1/2} G G^{-1} s_i\|_2 \le \|(L^*)^{1/2} G\|_2 \, \|G^{-1}\|_2 \, \|s_i\|_2 \le \frac{4(1 + \sqrt{1 - \omega})^{1/2}}{\alpha\sqrt{r} - 2}.$$

The last inequality follows from Corollary 2 and Lemma 6. The right-hand side is bounded such that


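$$
\begin{aligned}
\min\left\{\frac{1}{2\sqrt{r - 1}}, \frac{1}{4}\right\} \frac{\sigma_{\min}(G^\circ)}{1 + 80\kappa(G^\circ)^2}
&\ge \min\left\{\frac{1}{2\sqrt{r - 1}}, \frac{1}{4}\right\} \frac{(1 - \sqrt{1 - \omega})^{3/2}}{81 + 79\sqrt{1 - \omega}} \\
&\ge \frac{1}{81} \min\left\{\frac{1}{2\sqrt{r - 1}}, \frac{1}{4}\right\} \frac{(1 - \sqrt{1 - \omega})^{3/2}}{1 + \sqrt{1 - \omega}}.
\end{aligned}
$$

The first inequality follows from Corollary 2, since $1 + 80\,\frac{1 + \sqrt{1 - \omega}}{1 - \sqrt{1 - \omega}} = \frac{81 + 79\sqrt{1 - \omega}}{1 - \sqrt{1 - \omega}}$, and the second follows from $81 + 79\sqrt{1 - \omega} \le 81(1 + \sqrt{1 - \omega})$. Therefore, the inequality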

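$$\frac{4(1 + \sqrt{1 - \omega})^{1/2}}{\alpha\sqrt{r} - 2} \le \frac{1}{81} \min\left\{\frac{1}{2\sqrt{r - 1}}, \frac{1}{4}\right\} \frac{(1 - \sqrt{1 - \omega})^{3/2}}{1 + \sqrt{1 - \omega}} \iff 324\delta \le \min\left\{\frac{1}{2\sqrt{r - 1}}, \frac{1}{4}\right\} (\alpha\sqrt{r} - 2) \qquad (17)$$

implies (16). Here, we denote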


$$\delta = \left(\frac{1 + \sqrt{1 - \omega}}{1 - \sqrt{1 - \omega}}\right)^{3/2}.$$

Note that the value of $\delta$ is determined by $\alpha$, since $\omega$ is given by (14).

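We show that inequality (17) holds for any $r \ge 2$ when $\alpha = 1225$. In the case of $2 \le r \le 5$, we have $2\sqrt{r - 1} \le 4$, so the minimum in (17) equals $1/4$; since $\alpha\sqrt{r} - 2 \ge \alpha\sqrt{2} - 2$, it is sufficient to show

$$324\delta \le \frac{\alpha\sqrt{2} - 2}{4}. \qquad (18)$$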


In the case of $r \ge 6$, we have $2\sqrt{r - 1} > 4$, and the inequality becomes

$$324\delta \le \frac{\alpha\sqrt{r} - 2}{2\sqrt{r - 1}}. \qquad (19)$$
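Let $f(x) = \frac{\alpha\sqrt{x} - 2}{2\sqrt{x - 1}}$ for $x \ge 2$. Since the derivative of $f$ has the same sign as $2\sqrt{x} - \alpha$, the function $f$ attains its minimum at $x = \alpha^2/4$. Thus, we can put a lower bound on the right-hand side of (19):

$$\frac{\alpha\sqrt{r} - 2}{2\sqrt{r - 1}} \ge f(\alpha^2/4) = \frac{\sqrt{\alpha^2 - 4}}{2} > \frac{\alpha\sqrt{2} - 2}{4}.$$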


Accordingly, inequality (18) implies (19). For inequality (18) with $\alpha = 1225$, we have the relation

$$324\delta \le 432.4 < 432.6 \le \frac{\alpha\sqrt{2} - 2}{4}.$$

Therefore, inequality (17) holds for any $r \ge 2$ when $\alpha = 1225$. This leads us to conclude that inequality (16) holds for $i = 1, \dots, m - r$.

□
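The constants in the proof of Proposition 3 can be reproduced numerically. The Python sketch below (helper names are illustrative) computes $\delta$ for $\alpha = 1225$, compares both sides of (18), checks the closed-form minimum of $f$, and spot-checks (17) over a finite range of $r$ that contains the minimizer $\alpha^2/4$. A finite loop is only a sanity check; the argument above is what covers every $r \ge 2$.

```python
import math

ALPHA = 1225  # value fixed in Proposition 3


def omega(alpha):
    """omega from (14)."""
    t = alpha * math.sqrt(2)
    return ((t - 2) / (t + 2)) ** 4


def delta(alpha):
    """delta = ((1 + sqrt(1 - omega)) / (1 - sqrt(1 - omega)))^(3/2)."""
    s = math.sqrt(1 - omega(alpha))
    return ((1 + s) / (1 - s)) ** 1.5


def rhs17(alpha, r):
    """Right-hand side of (17): min{1/(2 sqrt(r - 1)), 1/4} * (alpha sqrt(r) - 2)."""
    return min(1 / (2 * math.sqrt(r - 1)), 0.25) * (alpha * math.sqrt(r) - 2)


def f(x, alpha):
    """f(x) = (alpha sqrt(x) - 2) / (2 sqrt(x - 1)) from the case r >= 6."""
    return (alpha * math.sqrt(x) - 2) / (2 * math.sqrt(x - 1))


lhs = 324 * delta(ALPHA)
rhs18 = (ALPHA * math.sqrt(2) - 2) / 4
x_star = ALPHA ** 2 / 4            # claimed minimizer of f
f_min = f(x_star, ALPHA)
print(lhs, rhs18, f_min, math.sqrt(ALPHA ** 2 - 4) / 2)

# Finite sanity check of (17) for 2 <= r <= 400000 (this range contains x_star).
worst_margin = min(rhs17(ALPHA, r) - lhs for r in range(2, 400_001))
print(worst_margin)
```

The smallest margin occurs at $r = 2$, which is consistent with (18) being the binding case.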


Let us remark on the choice of $\alpha = 1225$ in the proposition.


Remark 3. The value of $\delta$ in (18) is the value at $x = \alpha$ of the composite function $f = f_2 \circ f_1$ with

$$f_1(x) = \left(\frac{\sqrt{2}x - 2}{\sqrt{2}x + 2}\right)^4 \quad \text{and} \quad f_2(x) = \left(\frac{1 + \sqrt{1 - x}}{1 - \sqrt{1 - x}}\right)^{3/2}.$$

The function $f$ is monotonically decreasing for $x > 2$, and its value approaches 1 as $x$ goes to infinity, while the right-hand side of (18) increases with $\alpha$. Thus, $\alpha = 1225$ is the minimum integer that satisfies inequality (18).
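Remark 3 can be illustrated by scanning integers $\alpha$ upward until (18) holds. This minimal sketch (function names are illustrative) relies on the monotonicity stated in the remark: the left-hand side of (18) decreases and the right-hand side increases in $\alpha$, so the first integer found is the smallest one.

```python
import math


def satisfies_18(alpha):
    """Inequality (18): 324*delta <= (alpha*sqrt(2) - 2) / 4."""
    t = alpha * math.sqrt(2)
    w = ((t - 2) / (t + 2)) ** 4              # omega from (14)
    s = math.sqrt(1 - w)
    delta = ((1 + s) / (1 - s)) ** 1.5
    return 324 * delta <= (t - 2) / 4


def smallest_integer_alpha(start=2):
    """First integer alpha >= start satisfying (18)."""
    alpha = start
    while not satisfies_18(alpha):
        alpha += 1
    return alpha


print(smallest_integer_alpha())
```

Run as is, the scan should stop at 1225, in line with Remark 3; at $\alpha = 1224$ the left-hand side of (18) still exceeds the right-hand side.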


In Proposition 3, we evaluated the size of the basis error due to Algorithm 2 in terms of $P^\circ$. The proof of Theorem 3 is obtained by rewriting this evaluation in terms of $\tilde{A}$ instead of $P^\circ$.


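(Proof of Theorem 3). The theorem supposes that $r \ge 2$ and $A$ satisfies Assumption 1. Therefore, Proposition 3 tells us that, if $\|N\|_2 \le \sigma_{\min}(F)/(\alpha\sqrt{r})$ and $\alpha = 1225$, then Algorithm 2 with the input $(\tilde{A}, r)$ returns an output $\mathcal{I}$ such that there is an order of the elements in $\mathcal{I}$ satisfying

$$\|p^\circ_{\mathcal{I}(j)} - g^\circ_j\|_2 \le \left(80\kappa(G^\circ)^2 + 1\right)\epsilon \qquad (20)$$

for all $j = 1, \dots, r$, where $\epsilon$ satisfies

$$\|s^\circ_i\|_2 \le \epsilon \quad \text{for all } i = 1, \dots, m - r. \qquad (21)$$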

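The norm of $s^\circ_i$ is bounded from above by using $\|N\|_2$ such that

$$\|s^\circ_i\|_2 = \|(L^*)^{1/2} G G^{-1} s_i\|_2 \le \|(L^*)^{1/2} G\|_2 \, \|G^{-1}\|_2 \, \|s_i\|_2 \le \frac{4(1 + \sqrt{1 - \omega})^{1/2}}{\sigma_{\min}(G)} \|N\|_2.$$

The last inequality follows from Corollary 2 and Lemma 3. We choose $\epsilon$ as

$$\epsilon = \frac{4(1 + \sqrt{1 - \omega})^{1/2}}{\sigma_{\min}(G)} \|N\|_2. \qquad (22)$$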
By this choice, inequality (21) is satisfied for all $i = 1, \dots, m - r$. In what follows, we use $k$ to denote $\mathcal{I}(j)$ in (20) for simplicity. For the left-hand side of (20), we have

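$$
\begin{aligned}
\|p^\circ_k - g^\circ_j\|_2 &= \|(L^*)^{1/2}(p_k - g_j)\|_2 \\
&\ge \sigma_{\min}((L^*)^{1/2}) \, \|p_k - g_j\|_2 \\
&\ge \sigma_{\min}((L^*)^{1/2} G) \, \sigma_{\min}(G^{-1}) \, \|p_k - g_j\|_2 \\
&\ge \frac{(1 - \sqrt{1 - \omega})^{1/2}}{\sigma_{\max}(G)} \, \|p_k - g_j\|_2.
\end{aligned}
$$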


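The last inequality follows from Corollary 2. For the right-hand side of (20), we have, from Corollary 2 and the choice of $\epsilon$ in (22),

$$\left(80\kappa(G^\circ)^2 + 1\right)\epsilon \le \left(1 + \frac{80(1 + \sqrt{1 - \omega})}{1 - \sqrt{1 - \omega}}\right) \frac{4(1 + \sqrt{1 - \omega})^{1/2}}{\sigma_{\min}(G)} \|N\|_2.$$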


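Accordingly,

$$\frac{(1 - \sqrt{1 - \omega})^{1/2}}{\sigma_{\max}(G)} \|p_k - g_j\|_2 \le \left(1 + \frac{80(1 + \sqrt{1 - \omega})}{1 - \sqrt{1 - \omega}}\right) \frac{4(1 + \sqrt{1 - \omega})^{1/2}}{\sigma_{\min}(G)} \|N\|_2,$$

which gives

$$\|p_k - g_j\|_2 \le \frac{4(81 + 79\sqrt{1 - \omega})(1 + \sqrt{1 - \omega})^{1/2}}{(1 - \sqrt{1 - \omega})^{3/2}} \, \kappa(G) \, \|N\|_2. \qquad (23)$$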

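By Lemma 6, the condition number of $G$ is bounded from above by using that of $F$ such that

$$\kappa(G) \le \frac{\alpha\sqrt{r}}{\alpha\sqrt{r} - 2}\kappa(F) + \frac{2}{\alpha\sqrt{r} - 2} = \frac{\alpha\sqrt{r}}{\alpha\sqrt{r} - 2}\left(\kappa(F) + 1\right) - 1.$$

Here, we consider the function $f(x) = x/(x - 2)$ for $x > 2$. Since the function is monotonically decreasing for $x > 2$ and $\alpha\sqrt{r} \ge \alpha\sqrt{2}$, we have

$$\kappa(G) \le \frac{\alpha\sqrt{2}}{\alpha\sqrt{2} - 2}\kappa(F) + \frac{2}{\alpha\sqrt{2} - 2}.$$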

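By using this inequality, we replace $\kappa(G)$ in (23) with $\kappa(F)$, and then

$$\|p_k - g_j\|_2 \le \frac{4(81 + 79\sqrt{1 - \omega})(1 + \sqrt{1 - \omega})^{1/2}}{(1 - \sqrt{1 - \omega})^{3/2}} \left(\frac{\alpha\sqrt{2}}{\alpha\sqrt{2} - 2}\kappa(F) + \frac{2}{\alpha\sqrt{2} - 2}\right) \|N\|_2.$$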


Since $\alpha = 1225$ and $\omega$ is of the form (14), the above inequality implies that $\|p_k - g_j\|_2 \le (432\kappa(F) + 1)\|N\|_2$. By Lemma 5, this inequality leads to $\|\tilde{a}_k - f_j\|_2 \le (432\kappa(F) + 4)\|N\|_2$. □
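The numerical constants 432 and 1 in the last step can be recomputed from (14), (23), and the bound on $\kappa(G)$ obtained from Lemma 6. The Python sketch below assumes that bound has the form $\kappa(G) \le \frac{\alpha\sqrt{2}}{\alpha\sqrt{2} - 2}\kappa(F) + \frac{2}{\alpha\sqrt{2} - 2}$; it also checks that the $r$-dependent factor $\alpha\sqrt{r}/(\alpha\sqrt{r} - 2)$ is largest at $r = 2$. The helper name `kappa_G_factor` is hypothetical.

```python
import math

ALPHA = 1225
t2 = ALPHA * math.sqrt(2)        # alpha * sqrt(2)
w = ((t2 - 2) / (t2 + 2)) ** 4   # omega from (14)
s = math.sqrt(1 - w)             # sqrt(1 - omega)

# Constant multiplying kappa(G) * ||N||_2 in (23).
c23 = 4 * (81 + 79 * s) * math.sqrt(1 + s) / (1 - s) ** 1.5

# Substituting kappa(G) <= t2/(t2 - 2) * kappa(F) + 2/(t2 - 2) into (23).
coef_kappa_F = c23 * t2 / (t2 - 2)
coef_const = c23 * 2 / (t2 - 2)
print(c23, coef_kappa_F, coef_const)


def kappa_G_factor(r, alpha=ALPHA):
    """alpha*sqrt(r) / (alpha*sqrt(r) - 2), i.e. x/(x - 2) at x = alpha*sqrt(r)."""
    x = alpha * math.sqrt(r)
    return x / (x - 2)
```

With these values, `coef_kappa_F` stays just below 432 and `coef_const` is about 0.5, which is consistent with the bound $(432\kappa(F) + 1)\|N\|_2$.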